2025-01-14
Count models are used for outcomes that take non-negative integer values, ranging from 0 to \(\infty\). These are especially common in international relations, and include things like:
- Number of battle deaths in a conflict
- Number of terror attacks in a country
- Number of protests in a year
- Number of years between coups (although this may be better classified as a survival model)
The simplest way to model a count is with a Poisson distribution. A Poisson distribution has a single parameter, \(\lambda\), which represents the average rate of occurrence over a fixed time period.
As with logit models, this is non-linear, and estimating it requires maximum likelihood methods.
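As a sketch of what maximum likelihood estimation is doing here, we can maximize the Poisson log-likelihood numerically and compare the result to `glm()`. The data are simulated and all names below are illustrative, not from the lecture:

```r
# Sketch: estimate a Poisson regression by maximizing the log-likelihood
# numerically, then compare with glm(). Simulated data; illustrative names.
set.seed(42)
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))  # true beta0 = 0.5, beta1 = 1.2

# Negative Poisson log-likelihood as a function of (beta0, beta1)
neg_loglik <- function(b) {
  -sum(dpois(y, lambda = exp(b[1] + b[2] * x), log = TRUE))
}

mle <- optim(c(0, 0), neg_loglik)    # direct numerical maximization
fit <- glm(y ~ x, family = poisson)  # the standard route

round(mle$par, 2)                    # essentially identical to...
round(unname(coef(fit)), 2)          # ...the glm() estimates
```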
The expected value of \(Y|X\) is
\[ E(Y|X) = e ^{(b_0 + b_1X_1 +b_2X_2...)} \]
```r
poisson_func <- function(x, beta0, beta1) {
  exp(beta0 + beta1 * x)
}

# Vary the intercept, holding beta1 = 1
curve(poisson_func(x, 0, 1), xlim = c(-1, 5), lwd = 2,
      ylab = "poisson_func(x)", las = 1)
curve(poisson_func(x, -2, 1), xlim = c(-1, 5), lwd = 2, lty = 2, col = "red", add = TRUE)
curve(poisson_func(x, 2, 1), xlim = c(-1, 5), lwd = 2, lty = 3, col = "blue", add = TRUE)
legend(-1, 150, c(expression(paste(beta[0], " = 0")),
                  expression(paste(beta[0], " = -2")),
                  expression(paste(beta[0], " = 2"))),
       lwd = 2, lty = 1:3, col = c("black", "red", "blue"))
```
```r
# Vary the slope, holding beta0 = 3
curve(poisson_func(x, 3, -2), xlim = c(-1, 5), lwd = 2,
      ylab = "poisson_func(x)", las = 1)
curve(poisson_func(x, 3, 0), xlim = c(-1, 5), lwd = 2, lty = 2, col = "blue", add = TRUE)
curve(poisson_func(x, 3, 2), xlim = c(-1, 5), lwd = 2, lty = 3, col = "red", add = TRUE)
legend(3, 150, c(expression(paste(beta[1], " = -2")),
                 expression(paste(beta[1], " = 0")),
                 expression(paste(beta[1], " = 2"))),
       lwd = 2, lty = 1:3, col = c("black", "blue", "red"))
```
The Poisson distribution assumes that the variance is equal to the mean:

\[ \mathrm{Var}(Y) = E(Y) = \lambda \]
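A quick simulated check of that property (the rate of 4 is an arbitrary choice):

```r
# For Poisson draws, the sample mean and sample variance both estimate
# lambda, so they should be close. lambda = 4 is arbitrary.
set.seed(1)
draws <- rpois(1e5, lambda = 4)
mean(draws)
var(draws)
```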
But this is rarely true with real world data:
```r
mean(gtd_counts$nkill)
#> [1] 2.238636
var(gtd_counts$nkill)
#> [1] 63.0513
```
```r
library(ggplot2)
library(ggdist)          # stat_slab()
library(distributional)  # dist_poisson()

data.frame(dist = dist_poisson(lambda = mean(gtd_counts$nkill))) |>
  ggplot(aes(xdist = dist)) +
  stat_slab(alpha = .7) +
  stat_slab(data = gtd_counts, aes(x = nkill),
            fill = 'lightblue',
            inherit.aes = FALSE) +
  theme_bw() +
  labs(title = 'Terrorist attacks in 2018 compared to a Poisson distribution with the same mean')
```

Models with more variance than expected are called "overdispersed", and this causes us to underestimate our standard errors.
This may be the result of multiple data-generating processes, e.g. one pattern for states with terrorist insurgencies and another for states without them.
There are several ways to address overdispersion:

- Quasipoisson models, which optimize a different (quasi-)likelihood function that can account for overdispersion
- A random effects model with one random effect per observation
- Hurdle or zero-inflated models (in special cases)
- A negative binomial model
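As a sketch, here are the Poisson, quasipoisson, and negative binomial fits side by side on simulated overdispersed data (object names are illustrative, not from the lecture):

```r
# Sketch: the same deliberately overdispersed data fit three ways.
# Simulated data; object names are illustrative.
library(MASS)  # ships with R; glm.nb() fits the negative binomial

set.seed(7)
x <- rnorm(300)
y <- rnbinom(300, size = 1, mu = exp(0.3 + 0.8 * x))  # variance >> mean

pois  <- glm(y ~ x, family = poisson)       # assumes variance = mean
quasi <- glm(y ~ x, family = quasipoisson)  # same coefficients, wider SEs
nb    <- glm.nb(y ~ x)                      # estimates a dispersion parameter

summary(quasi)$dispersion  # well above 1 for these data
```

Note that the quasipoisson fit reproduces the Poisson coefficients exactly; only the standard errors change.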
```r
tibble('a' = dist_negative_binomial(size = 1, prob = .5),
       'b' = dist_negative_binomial(size = 10, prob = .5),
       'c' = dist_negative_binomial(size = 100, prob = .5)) |>
  ggplot() +
  stat_slab(aes(xdist = a), fill = 'lightblue', alpha = .7) +
  stat_slab(aes(xdist = b), fill = 'lightgreen', alpha = .7) +
  stat_slab(aes(xdist = c), fill = 'orange', alpha = .7) +
  theme_bw()
```

|  | poisson | negative binomial |
|---|---|---|
| liberal democracy score | -1.718 | -0.781 |
|  | [-2.166, -1.267] | [-3.496, 1.947] |
| gdp per capita | 0.031 | 0.007 |
|  | [0.024, 0.038] | [-0.034, 0.051] |
| any attacks in prior year | 2.888 | 2.544 |
|  | [2.448, 3.386] | [1.358, 3.723] |
| Num.Obs. | 176 | 176 |
| AIC | 1525.1 | 413.0 |
| BIC | 1537.8 | 428.8 |
| Log.Lik. | -758.548 | -201.481 |
| F | 63.204 | 7.386 |
| RMSE | 7.60 | 7.69 |

*95% CI in brackets*
The performance package has a function to check for overdispersion using simulated residuals from a count model. The null hypothesis here is no overdispersion:
```
# Overdispersion test

       dispersion ratio =   18.614
  Pearson's Chi-Squared = 3201.581
                p-value =  < 0.001
```
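The statistics printed above can be reproduced by hand: the dispersion ratio is the Pearson chi-squared statistic divided by the residual degrees of freedom, and values well above 1 indicate overdispersion. A sketch on simulated data (names illustrative):

```r
# Sketch of what the overdispersion test computes: Pearson chi-squared over
# residual df. Simulated data; names are illustrative.
set.seed(11)
x <- rnorm(250)
y <- rnbinom(250, size = 1, mu = exp(0.2 + 0.6 * x))  # overdispersed by design

fit <- glm(y ~ x, family = poisson)
pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
ratio <- pearson_chisq / df.residual(fit)
ratio                                                        # dispersion ratio
pchisq(pearson_chisq, df.residual(fit), lower.tail = FALSE)  # p-value
```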
We can also see that this is addressed by the negative binomial model.
Since the only difference here is the addition of a single dispersion parameter, we can treat these two models as nested and use a likelihood ratio test to determine whether the negative binomial model is a better fit. Unsurprisingly, it is:
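The test can be sketched by hand using the log-likelihoods reported in the model table above (`lmtest::lrtest()` automates this):

```r
# Sketch: likelihood-ratio test by hand, using the log-likelihoods from the
# model table above.
ll_pois <- -758.548
ll_nb   <- -201.481

lr_stat <- 2 * (ll_nb - ll_pois)
lr_stat
# One extra parameter (the dispersion parameter), so df = 1. Because the null
# puts that parameter on the boundary of its space, this chi-squared p-value
# is conservative -- but the conclusion here is unambiguous either way.
pchisq(lr_stat, df = 1, lower.tail = FALSE)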
The negative binomial model and the Poisson model both have the same inverse link function, so generating predictions works the same way:

1. Calculate the linear prediction (plug in the values and multiply each by its coefficient)
2. Exponentiate the result
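Both steps can be done by hand and checked against `predict()`. A sketch on simulated data (names illustrative):

```r
# Sketch: predictions by hand match predict(type = "response").
# Simulated data; names are illustrative.
set.seed(3)
x <- rnorm(100)
y <- rpois(100, lambda = exp(0.4 + 0.9 * x))
fit <- glm(y ~ x, family = poisson)

b   <- coef(fit)
eta <- b[1] + b[2] * 2   # step 1: linear prediction at x = 2
exp(eta)                 # step 2: exponentiate to get the expected count

# predict() does the same thing:
predict(fit, newdata = data.frame(x = 2), type = "response")
```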
Count models are still nonlinear! So, as with logits, the marginal effect of X will depend in part on the other parameters.
Exponentiated coefficients can also be interpreted as the effect of a one unit increase in X on the rate ratio of Y, so an exponentiated coefficient of 2 means the rate doubles, .5 means it's cut in half, and 1 means it stays the same.
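For example, in plain arithmetic:

```r
# Rate-ratio arithmetic: a raw coefficient of log(2) exponentiates to a rate
# ratio of 2 (each one-unit increase in X doubles the expected count), while
# a raw coefficient of 0 exponentiates to 1 (no change).
exp(log(2))  # rate ratio of 2: expected count doubles
exp(0)       # rate ratio of 1: expected count unchanged
```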
```r
modelsummary(model_list,
             coef_map = map,
             coef_omit = "Intercept",
             conf_level = .95,
             note = "95% CI in brackets. Exponentiated coefficients",
             exponentiate = TRUE)
```

|  | poisson | negative binomial |
|---|---|---|
| liberal democracy score | 0.179 | 0.458 |
|  | (0.041) | (0.536) |
| gdp per capita | 1.032 | 1.007 |
|  | (0.003) | (0.018) |
| any attacks in prior year | 17.953 | 12.734 |
|  | (4.275) | (6.994) |
| Num.Obs. | 176 | 176 |
| AIC | 1525.1 | 413.0 |
| BIC | 1537.8 | 428.8 |
| Log.Lik. | -758.548 | -201.481 |
| F | 63.204 | 7.386 |
| RMSE | 7.60 | 7.69 |

*95% CI in brackets. Exponentiated coefficients*
```
        Term  Contrast Estimate Std. Error      z Pr(>|z|)   S   2.5 %  97.5 %
 any_attacks  yes - no   3.3055     1.1976  2.760  0.00578 7.4  0.9583  5.6526
 e_gdppc            +1   0.0143     0.0388  0.370  0.71163 0.5 -0.0616  0.0903
 v2x_libdem         +1  -1.1807     1.2650 -0.933  0.35066 1.5 -3.6601  1.2987

Type: response
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
```
For count data, a Poisson model is a starting point, but overdispersion is so common in real-world data that you should never assume it away, and you should be a bit skeptical if you see a Poisson model with no discussion of the problem.
When in doubt, use a negative binomial model. If there's no overdispersion, you should get roughly the same results as a Poisson.
- Ordered logit: for response variables with just a few ordered values (for instance, Likert responses on a survey)
- Multinomial logit: for responses with multiple discrete categories
  - Estimates K-1 sets of coefficients, one for each non-baseline category of the DV
  - Coefficients are the effect on the logged odds of being in a category relative to the baseline group

Fun fact: the sigmoid function is also used in neural networks.